Fault-Tolerant Middleware and the Magical 1%
Authors
Abstract
Through an extensive experimental analysis of over 900 possible configurations of a fault-tolerant middleware system, we present empirical evidence that the unpredictability inherent in such systems arises from merely 1% of the remote invocations. The occurrence of very high latencies cannot be regulated through parameters such as the number of clients, the replication style and degree, or the request rate. However, by selectively filtering out a "magical 1%" of the raw observations of various metrics, we show that performance, in terms of measured end-to-end latency and throughput, can be bounded and made easy to understand and control. This simple statistical technique enables us to guarantee, with some level of confidence, bounds for percentile-based quality of service (QoS) metrics, which dramatically increases our ability to tune and control a middleware system in a predictable manner.
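The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes that the "magical 1%" means the largest 1% of latency samples, which the abstract does not state explicitly. The function names `trim_top_fraction` and `summarize` and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def trim_top_fraction(samples, fraction=0.01):
    """Return the samples in ascending order with the largest `fraction` removed.

    Assumption: the outliers of interest are the highest values only.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(samples)
    drop = math.floor(len(ordered) * fraction)
    return ordered[: len(ordered) - drop]


def summarize(samples):
    """Report simple statistics for a non-empty list of latency samples."""
    return {
        "count": len(samples),
        "mean": sum(samples) / len(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Synthetic latencies in milliseconds: mostly small, with rare large spikes.
    rng = random.Random(0)
    latencies = [rng.uniform(1.0, 2.0) for _ in range(990)]
    latencies += [rng.uniform(50.0, 500.0) for _ in range(10)]

    print("raw:    ", summarize(latencies))
    print("trimmed:", summarize(trim_top_fraction(latencies)))
```

On data shaped like this, the maximum of the trimmed set is an empirical 99th-percentile bound, which is the kind of percentile-based QoS metric the abstract refers to.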
Related resources
A Middleware for Constructing Highly Available, Fault Tolerant, and Attack Tolerant Services
This paper describes the design of a middleware that provides support for constructing highly available, secure, fault-tolerant, and attack-tolerant services. The central component of this middleware is a group communication service that comprises six network protocols: atomic broadcast, group membership, failure detection, attack detection, group access control, and secure intermember commu...
Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems
Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, that operate within resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties requires the middleware to ...
Fault Tolerant Middleware for Agent Systems: A Refinement Approach
Agent technology offers a number of advantages over traditional distributed systems, such as asynchronous communication, anonymity of individual agents and ability to change operational context. However, it is notoriously difficult to ensure dependability of agent systems. In this paper we present a formal approach for the top-down development of fault tolerant middleware for agent systems. We ...
A study of unpredictability in fault-tolerant middleware
In enterprise applications relying on fault-tolerant middleware, it is a common engineering practice to establish service-level agreements (SLAs) based on the 95th or the 99th percentiles of the latency, to allow a margin for unexpected variability. However, the extent of this unpredictability has not been studied systematically. We present an extensive empirical study of unpredictability in 16...
Mission Statement: ToleranceZone A Self-Stabilizing Middleware for Wireless Sensor Networks
Wireless sensor networks (WSN) can be used in a wide range of monitoring and controlling applications. These networks consist of nodes with sparse resources, which makes application implementation challenging. Therefore, many middleware systems were developed in the last decade. Furthermore, unattended and long-living deployments of WSNs need fault-tolerant software architectures. The goal of t...